AITopics | synthetic identity

synthetic identity

Interpolating Speaker Identities in Embedding Space for Data Expansion

Liu, Tianchi, Tao, Ruijie, Wang, Qiongqiong, Jiang, Yidi, Sailor, Hardik B., Zhang, Ke, Lin, Jingru, Li, Haizhou

arXiv.org Artificial Intelligence, Aug-27-2025

The success of deep learning-based speaker verification systems is largely attributed to access to large-scale and diverse speaker identity data. However, collecting data from more identities is expensive, challenging, and often limited by privacy concerns. To address this limitation, we propose INSIDE (Interpolating Speaker Identities in Embedding Space), a novel data expansion method that synthesizes new speaker identities by interpolating between existing speaker embeddings. Specifically, we select pairs of nearby speaker embeddings from a pretrained speaker embedding space and compute intermediate embeddings using spherical linear interpolation. These interpolated embeddings are then fed to a text-to-speech system to generate corresponding speech waveforms. The resulting data is combined with the original dataset to train downstream models. Experiments show that models trained with INSIDE-expanded data outperform those trained only on real data, achieving 3.06% to 5.24% relative improvements. While INSIDE is primarily designed for speaker verification, we also validate its effectiveness on gender classification, where it yields a 13.44% relative improvement. Moreover, INSIDE is compatible with other augmentation techniques and can serve as a flexible, scalable addition to existing training pipelines.
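The interpolation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `normalize`, `slerp`, and `nearest_neighbor` are hypothetical, embeddings are plain Python lists, and "nearby" is assumed here to mean highest cosine similarity. The text-to-speech stage is omitted.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def slerp(a, b, t):
    """Spherical linear interpolation between a and b at fraction t in [0, 1].

    Both inputs are projected onto the unit sphere first, so the result
    is also a unit vector.
    """
    a, b = normalize(a), normalize(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between the two embeddings
    if omega < 1e-6:
        # Nearly identical vectors: linear interpolation is numerically safer.
        return normalize([(1 - t) * x + t * y for x, y in zip(a, b)])
    if omega > math.pi - 1e-6:
        raise ValueError("slerp is undefined for opposite vectors")
    s = math.sin(omega)
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


def nearest_neighbor(i, embeddings):
    """Index of the embedding most cosine-similar to embeddings[i], excluding i.

    Assumption: 'nearby' pairs are chosen by cosine similarity.
    """
    unit = [normalize(e) for e in embeddings]
    best, best_sim = None, -2.0
    for j, u in enumerate(unit):
        if j == i:
            continue
        sim = sum(x * y for x, y in zip(unit[i], u))
        if sim > best_sim:
            best, best_sim = j, sim
    return best


# Toy 2-D "speaker embeddings"; real embeddings have hundreds of dimensions.
speakers = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
partner = nearest_neighbor(0, speakers)
new_identity = slerp(speakers[0], speakers[partner], 0.5)
```

Unlike plain linear averaging, slerp keeps the interpolated point on the unit sphere, which matters when the downstream model expects length-normalized embeddings.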

artificial intelligence, deep learning, machine learning, (17 more...)

2508.1921

Country: Asia > China (0.29)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.89)

Technology:

Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Hybrid Generative Fusion for Efficient and Privacy-Preserving Face Recognition Dataset Generation

Li, Feiran, Xu, Qianqian, Bao, Shilong, Han, Boyu, Yang, Zhiyong, Huang, Qingming

arXiv.org Artificial Intelligence, Aug-19-2025

In this paper, we present our approach to the DataCV ICCV Challenge, which centers on building a high-quality face dataset to train a face recognition model. The constructed dataset must not contain identities overlapping with any existing public face datasets. To handle this challenge, we begin with a thorough cleaning of the baseline HSFace dataset, identifying and removing mislabeled or inconsistent identities through a Mixture-of-Experts (MoE) strategy combining face embedding clustering and GPT-4o-assisted verification. We retain the largest consistent identity cluster and apply data augmentation up to a fixed number of images per identity. To further diversify the dataset, we generate synthetic identities using Stable Diffusion with prompt engineering. As diffusion models are computationally intensive, we generate only one reference image per identity and efficiently expand it using Vec2Face, which rapidly produces 49 identity-consistent variants. This hybrid approach fuses GAN-based and diffusion-based samples, enabling efficient construction of a diverse and high-quality dataset. To address the high visual similarity among synthetic identities, we adopt a curriculum learning strategy by placing them early in the training schedule, allowing the model to progress from easier to harder samples. Our final dataset contains 50 images per identity, and all newly generated identities are checked with mainstream face datasets to ensure no identity leakage. Our method achieves 1st place in the competition, and experimental results show that our dataset improves model performance across 10K, 20K, and 100K identity scales. Code is available at https://github.com/Ferry-Li/datacv_fr.
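The abstract does not say how the identity-leakage check is performed. One plausible reading, sketched below purely as an assumption, is to compare each synthetic identity's face embedding against embeddings from public datasets and flag any pair whose cosine similarity exceeds a threshold. The names `cosine_similarity` and `find_leaked_identities`, the toy vectors, and the threshold value 0.6 are all hypothetical and do not come from the paper or its repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def find_leaked_identities(synthetic, reference, threshold=0.6):
    """Return ids of synthetic identities that look like a known identity.

    synthetic: dict mapping identity id -> embedding (list of floats)
    reference: list of embeddings from existing public datasets
    threshold: hypothetical similarity cut-off; a real pipeline would
               calibrate it for the face recognition model in use.
    """
    leaked = []
    for identity_id, embedding in synthetic.items():
        if any(cosine_similarity(embedding, ref) >= threshold for ref in reference):
            leaked.append(identity_id)
    return leaked


# Toy 2-D embeddings for illustration only.
synthetic_ids = {"syn_a": [1.0, 0.0], "syn_b": [0.0, 1.0]}
public_refs = [[0.9, 0.1]]
flagged = find_leaked_identities(synthetic_ids, public_refs)
```

In this toy example `syn_a` is flagged because it points in almost the same direction as the reference embedding, while `syn_b` is kept.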

artificial intelligence, machine learning, natural language, (19 more...)

2508.10672

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SIG: A Synthetic Identity Generation Pipeline for Generating Evaluation Datasets for Face Recognition

Nzalasse, Kassi, Raj, Rishav, Laird, Eli, Clark, Corey

arXiv.org Artificial Intelligence, Sep-17-2024

As Artificial Intelligence applications expand, the evaluation of models faces heightened scrutiny. Ensuring public readiness requires evaluation datasets, which differ from training data by being disjoint and ethically sourced in compliance with privacy regulations. The performance and fairness of face recognition systems depend significantly on the quality and representativeness of these evaluation datasets. This data is sometimes scraped from the internet without users' consent, raising ethical concerns that can prohibit its use without proper releases. In rare cases, data is collected in a controlled environment with consent; however, this process is time-consuming, expensive, and logistically difficult to execute. This creates a barrier for those unable to muster the immense resources required to gather ethically sourced evaluation datasets. To address these challenges, we introduce the Synthetic Identity Generation pipeline, or SIG, which allows for the targeted creation of ethical, balanced datasets for face recognition evaluation. Our proposed and demonstrated pipeline generates high-quality images of synthetic identities with controllable pose, facial features, and demographic attributes, such as race, gender, and age. We also release an open-source evaluation dataset named ControlFace10k, consisting of 10,008 face images of 3,336 unique synthetic identities balanced across race, gender, and age, generated using the proposed SIG pipeline. We analyze ControlFace10k along with a non-synthetic BUPT dataset using state-of-the-art face recognition algorithms to demonstrate its effectiveness as an evaluation tool. This analysis highlights the dataset's characteristics and its utility in assessing algorithmic bias across different demographic groups.

controlface10k, dataset, synthetic identity, (12 more...)

2409.08345

Country:

North America > United States > California (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
(130 more...)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

5 ways AI is detecting and preventing identity fraud

#artificialintelligence, Aug-20-2022, 01:10:28 GMT

Identity fraud set new records in 2022. The rise was set in motion by the approval of fraudulent SBA loan applications totaling nearly $80 billion and by the rapid growth of synthetic identity fraud. Almost 50% of Americans became victims of identity fraud between 2020 and 2022.

identity fraud, synthetic identity, synthetic identity fraud, (14 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.16)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.84)

Deepfakes in finance: a threat to be wary of? - FinTech Futures

#artificialintelligence, Nov-3-2020, 14:10:30 GMT

Since the start of the COVID-19 crisis, the number of fraud cases has continued to grow. In late June, over £16 million was lost to online shopping fraud during lockdown, according to Action Fraud. From posing as government officials to impersonating online TV subscription services, fraudsters are trying every way they can to lure people into handing over their personal details and to prey on their hard-earned savings. Now, the latest weapon fraudsters are adding to their arsenal is synthetic identity fraud. Fraudsters are turning to synthetic identities to open new accounts.

artificial intelligence, machine learning, synthetic identity, (13 more...)

Country:

North America > United States (0.06)
Europe (0.06)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (0.92)
Information Technology > Artificial Intelligence > Vision (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)
Information Technology > e-Commerce > Financial Technology (0.40)
